class 4 - lecture notes
-----------------------

problems in bioinformatics (overview)


  - sequencing genomes:
	physical mapping, sequence assembly

    concrete problem: sequence assembly problem:
	given: fragments of a large DNA sequence (e.g., human chromosome)
		with overlaps (multiple coverage)
	want: entire sequence

	complicating factors:
	- computational complexity:
	  problem can be seen as variation of shortest common superstring problem
	  which is known to be NP-hard
	  (see, e.g., http://citeseer.nj.nec.com/kececioglu93combinatorial.html)
	- incorrect/missing nucleotides ion fragment data (1-10%)
	- sequence repeats
	
	solution strategy:
	- determine fragment overlaps
	- orient and approx layout fragments based on overlapss
	- use multiple seq alignment of fragments to infer sequence

	
	(see http://www.dkfz-heidelberg.de/tbi/bioinfo/PhysicalMapping/SeqAssembly/,
	**http://www.biostat.harvard.edu/~dscholte/ieeealgs.pdf )


  - interpreting genomic sequence data:
	sequence alignment, gene finding, 
	sequence annotation (intron/exons, promotors, specific rna genes),
	structure prediction (rna and proteins), 
	pattern discovery, classification, clustering

    concrete problem: rna secondary structure prediction problem
	given: an RNA sequence X
	want: secondary structure of X (base pairings in X)

	complications: 
	- factors determining secondary structure are complex 
		and not	entirely understood
	- secondary structure of X may not be uniquely determined
	- computational complexity


  - understanding the cell / organisms:
	understanding interactions between proteins, dna, rna;
	regulatory pathways and networks, simulations

    concrete problem: gene regulatory relationship inference
	given: expression profiles of two genes A,B
		(e.g., from microarray analysis)
	want: decide whether A activates B, B activates A,
		A inhibits B, B inhibits A, or 
		there is no (direct) regulatory relationship between A and B
	
	complications:
	- imprecision/limitations in measuring expression profiles
	- indirect / complex regulatory relationships


  - relations between organisms and evolutionary questions:
	phylogenetic trees, computational models of evolution, simulations

    concrete problem: phylogenetic tree inference
	given: homologous dna sequences from multiple species
	want: evolutionary tree relating these species

	complications:
	- errors in sequence 
	- complexity/quality of multiple sequence alignment
	- limited knowledge of evolutionary processes


  - biomolecular computing:
	models of biomolecular computing: e.g., p-systems
	dna computing, inverse folding, self-assembling structures

    concrete problem: design of self-assembling dna nanostructures
	given: desired structure formed by multiple strands of dna
	want: find sequences for each strand (building block)
		such that entire structure forms spontaneously

	complications: 
	- errors in self-assembly (undesired interactions between strands)
	- efficiency of self-assembly process


  - related aspects (often not counted as bioinformatics)
	- simple tools (simple motiv search, dna-rna-protein seq translations, ...)
	- computational models of organisms or biological systems
	(biocybernetics)
	- nature-inspired algorithms
	(genetic algorithms, neural networks, ant colony optimisation, ...)
	- artificial life
	(two flavours: life-like behaviour of artificial systems,
		(re-)design of biological organisms)

  - three views on bioinformatics: 
	- cs just provides tools for biologists / biochemists
	  problems: lack of biological knowledge can lead to weaknesses in tools
	    lack of cs understanding can lead to questionable application of tools
	- cs research motivated by biological problems
	  problem: lack of biological knowledge + overabstraction leads
	    to algorithms/results with limited/no biological application
	- synergistic, interdisciplinary research
	  problem: needs close collaboration of experts in cs and molecular biology,
	    broad common basis of cs and basic biological knowledge
	  but: high potential for innovative and intriguing research and applications


(last 15min of class: project group formation / informal discussion)